Rising inequality and the increasing privatization of space in urban landscapes are drawing attention to some of the only public spaces left: libraries. This study analyzes the extent to which library service areas differ along lines of inequality such as race and class. It delineates library catchment areas in Chicago, IL and compares them with socio-economic data at the tract and block levels. This analysis is the first part of a two-pronged method that asks to what extent the catchment areas are distinct.
Key words: public space, libraries, population weighted aggregation, service areas, demographics
Subject: Social and Behavioral Sciences: Geography: Human Geography
Date created: 11/28/2023
Date modified: 12/5/2023
Spatial Coverage: Chicago, IL
Spatial Resolution: Census Tracts, Census Blocks, Library Service Areas
Spatial Reference System: EPSG:32616
Temporal Coverage: 2017-Present
Temporal Resolution: Specify the temporal resolution of your study, i.e. the duration of time each observation represents or the revisit period for repeated observations
This study is a reproduction of my own original study. As part of my independent research with Professor Peter Nelson, I created a workflow in QGIS to answer the question: How do library service catchment areas differ along lines of race, class, gender, etc.? To streamline this research and make it reproducible and replicable, I decided to reproduce the workflow in R and create a research compendium for it as my final independent project in GEOG0361: Open GIScience.
This research aims to answer two questions: How do library service catchment areas differ along lines of race, class, gender, etc.? How do the public services in these catchment areas reflect the nature of their local constituents?
# record all the packages you are using here
# this includes any calls to library(), require(),
# and double colons such as here::i_am()
packages <- c(
"tidycensus", "tidyverse", "sf", "classInt", "readr", "tigris",
"rgdal","rstudioapi", "here", "s2", "pastecs", "tmap", "knitr",
"kableExtra", "broom", "leaflet", "usethis", "deldir", "spatstat"
)
# force all conflicts to become errors
# if you load dplyr and use filter(), R has to guess whether you mean dplyr::filter() or stats::filter()
# the conflicted package forces you to be explicit about this
# disable at your own peril
# https://conflicted.r-lib.org/
require(conflicted)
## Loading required package: conflicted
# load and install required packages
# https://groundhogr.com/
if (!require(groundhog)) {
install.packages("groundhog")
require(groundhog)
}
## Loading required package: groundhog
## Attached: 'Groundhog' (Version: 3.1.2)
## Tips and troubleshooting: https://groundhogR.com
if(!require(here)){
install.packages("here")
require(here)
}
## Loading required package: here
## here() starts at C:/Users/azalecki/Documents/GitHub/Zalecki-2023
# this date will be used to determine the versions of R and your packages
# it is best practice to keep R and its packages up to date
groundhog.day <- "2023-06-26"
set.groundhog.folder("../../data/scratch/groundhog/")
## The groundhog folder already was '../../data/scratch/groundhog/'
# this replaces any library() or require() calls
groundhog.library(packages, groundhog.day)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## Linking to GEOS 3.11.2, GDAL 3.6.2, PROJ 9.2.0; sf_use_s2() is TRUE
## To enable caching of data, set `options(tigris_use_cache = TRUE)`
## in your R script or .Rprofile.
## Loading required package: sp
## The legacy packages maptools, rgdal, and rgeos, underpinning the sp package,
## which was just loaded, will retire in October 2023.
## Please refer to R-spatial evolution reports for details, especially
## https://r-spatial.org/r/2023/05/15/evolution4.html.
## It may be desirable to make the sf package available;
## package maintainers should consider adding sf to Suggests:.
## The sp package is now running under evolution status 2
## (status 2 uses the sf package in place of rgdal)
## Please note that rgdal will be retired during October 2023,
## plan transition to sf/stars/terra functions using GDAL and PROJ
## at your earliest convenience.
## See https://r-spatial.org/r/2023/05/15/evolution4.html and https://github.com/r-spatial/evolution
## rgdal: version: 1.6-7, (SVN revision 1203)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 3.6.2, released 2023/01/02
## Path to GDAL shared files: C:/Users/azalecki/AppData/Local/R/win-library/4.3/rgdal/gdal
## GDAL does not use iconv for recoding strings.
## GDAL binary built with GEOS: TRUE
## Loaded PROJ runtime: Rel. 9.2.0, March 1st, 2023, [PJ_VERSION: 920]
## Path to PROJ shared files: C:/Users/azalecki/AppData/Local/R/win-library/4.3/rgdal/proj
## PROJ CDN enabled: FALSE
## Linking to sp version:1.6-1
## To mute warnings of possible GDAL/OSR exportToProj4() degradation,
## use options("rgdal_show_exportToProj4_warnings"="none") before loading sp or rgdal.
##
## Attaching package: 'pastecs'
## The following objects are masked from 'package:dplyr':
##
## first, last
## The following object is masked from 'package:tidyr':
##
## extract
##
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
##
## group_rows
## deldir 1.0-9 Nickname: "Partial Distinction"
##
## The syntax of deldir() has changed since version
## 0.0-10. In particular the "dummy points" facility
## (which was a historical artifact) has been removed.
## In the current version, 1.0-8, an argument "id" has
## been added to deldir(). This new argument permits the
## user to specifier identifiers for points. The default
## behaviour is to continue using the indices of the
## points to identify them. In view of the fact that
## point identifiers may be user-supplied, the arguement
## "number", in plot.deldir() and plot.tile.list(), has
## had its name changed to "labelPts", and the argument
## "nex" in plot.deldir() has had its name changed to
## "lex". In addition the name of the forth component
## of the "cmpnt_col" argument in plot.deldir() has been
## changed from "num" to "labels". There is a new
## function getNbrs(), and the function tileInfo() has
## been modified to include output from getNbrs().
## Please consult the help.
## Loading required package: spatstat.data
## Loading required package: spatstat.geom
## spatstat.geom 3.2-1
## Loading required package: spatstat.random
## spatstat.random 3.1-5
## Loading required package: spatstat.explore
## Loading required package: nlme
##
## Attaching package: 'nlme'
## The following object is masked from 'package:dplyr':
##
## collapse
## spatstat.explore 3.2-1
## Loading required package: spatstat.model
## Loading required package: rpart
## spatstat.model 3.2-4
## Loading required package: spatstat.linnet
## spatstat.linnet 3.1-1
##
## spatstat 3.0-6
## For an introduction to spatstat, type 'beginner'
## Successfully attached 'tidycensus_1.4.1'
## Successfully attached 'tidyverse_2.0.0'
## Successfully attached 'sf_1.0-13'
## Successfully attached 'classInt_0.4-9'
## Successfully attached 'readr_2.1.4'
## Successfully attached 'tigris_2.0.3'
## Successfully attached 'rgdal_1.6-7'
## Successfully attached 'rstudioapi_0.14'
## Previously attached 'here_1.0.1'
## Successfully attached 's2_1.1.4'
## Successfully attached 'pastecs_1.3.21'
## Successfully attached 'tmap_3.3-3'
## Successfully attached 'knitr_1.43'
## Successfully attached 'kableExtra_1.3.4'
## Successfully attached 'broom_1.0.5'
## Successfully attached 'leaflet_2.1.2'
## Successfully attached 'usethis_2.2.1'
## Successfully attached 'deldir_1.0-9'
## Successfully attached 'spatstat_3.0-6'
# you may need to install a correct version of R
# you may need to respond OK in the console to permit groundhog to install packages
# you may need to restart R and rerun this code to load installed packages
# In RStudio, restart R with Session -> Restart Session
# record the R processing environment
# alternatively, use devtools::session_info() for better results
writeLines(
capture.output(sessionInfo()),
here("procedure", "environment", paste0("r-environment-", Sys.Date(), ".txt"))
)
# save package citations
knitr::write_bib(c(packages, "base"), file = here("software.bib"))
# set up default knitr parameters
# https://yihui.org/knitr/options/
knitr::opts_chunk$set(
echo = FALSE, # Run code, show outputs (don't show code)
fig.retina = 4,
fig.width = 8,
fig.path = paste0(here("results", "figures"), "/")
)
#set up Github repository as the R project
#use_github("azalecki/Zalecki-2023")
Each of the next subsections describes one data source. Secondary data sources for the study are to include the following:
## Retrieving data for the year 2021
## Warning: st_crs<- : replacing crs does not reproject data; use st_transform for
## that
I will add a more comprehensive list of variables as my senior research project progresses, but in this code I work with a single ACS table: Household Income (B19001). I initially attempted to build a data table with variables from several ACS tables, but due to my rudimentary skills in R I was not successful. To keep the code moving forward, I simplified the project and worked with only this one table.
## To install your API key for use in future sessions, run this function with `install = TRUE`.
## Getting data from the 2017-2021 5-year ACS
## Downloading feature geometry from the Census website. To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
## Loading ACS5 variables for 2021 from table B19001 and caching the dataset for faster future access.
Data for Chicago Public Library locations come in CSV format with coordinate data. Before uploading the CSV file to the GitHub repository, I used Microsoft Excel to manually separate the longitude and latitude values into two columns. No other data manipulation was done in Excel.
## Rows: 81 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): NAME, HOURS OF OPERATION, ADDRESS, CITY, STATE, PHONE, WEBSITE
## dbl (8): ZIP, Latitude, Longitude, Boundaries - ZIP Codes, Community Areas, ...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## [1] "sf" "tbl_df" "tbl" "data.frame"
## [1] POINT POINT POINT POINT POINT POINT POINT POINT POINT POINT POINT POINT
## [13] POINT POINT POINT POINT POINT POINT POINT POINT POINT POINT POINT POINT
## [25] POINT POINT POINT POINT POINT POINT POINT POINT POINT POINT POINT POINT
## [37] POINT POINT POINT POINT POINT POINT POINT POINT POINT POINT POINT POINT
## [49] POINT POINT POINT POINT POINT POINT POINT POINT POINT POINT POINT POINT
## [61] POINT POINT POINT POINT POINT POINT POINT POINT POINT POINT POINT POINT
## [73] POINT POINT POINT POINT POINT POINT POINT POINT POINT
## 18 Levels: GEOMETRY POINT LINESTRING POLYGON MULTIPOINT ... TRIANGLE
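Turning the Latitude and Longitude columns into points and projecting them to the study CRS can be sketched with sf. This is a minimal illustration, not the exact chunk used above: the two-row data frame is a hypothetical stand-in for the library CSV (in the study it would be read from the downloaded file, for example with `readr::read_csv()`), and the coordinates are assumed to be WGS84 (EPSG:4326). Note that `st_transform()` recomputes coordinates, whereas assigning a CRS with `st_crs(x) <- ...` only relabels them, which is the distinction the `st_crs<-` warning printed earlier points out.

```r
library(sf)

# Hypothetical stand-in for the library CSV after the Latitude/Longitude split
libraries_csv <- data.frame(
  NAME = c("Library A", "Library B"),
  Latitude = c(41.88, 41.97),
  Longitude = c(-87.63, -87.66)
)

# Build points: x = Longitude, y = Latitude, declared as WGS84 geographic coordinates
libraries <- st_as_sf(libraries_csv, coords = c("Longitude", "Latitude"), crs = 4326)

# Reproject to the study CRS (WGS 84 / UTM zone 16N, EPSG:32616);
# this changes the coordinate values, unlike st_crs(x) <- 32616
libraries_utm <- st_transform(libraries, 32616)
print(st_coordinates(libraries_utm))
```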
Because the ACS table I am using does not include population counts, I bring in population data separately. My intention for this project is a population-weighted aggregation, so I use 2020 decennial Census population at the block level (ACS estimates are not published for blocks) to estimate population distribution more accurately.
## Getting data from the 2020 decennial Census
## Downloading feature geometry from the Census website. To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
## Loading PL variables for 2020 from table P1 and caching the dataset for faster future access.
## Using the PL 94-171 Redistricting Data Summary File
## Using the PL 94-171 Redistricting Data Summary File
## Note: 2020 decennial Census data use differential privacy, a technique that
## introduces errors into data to preserve respondent confidentiality.
## ℹ Small counts should be interpreted with caution.
## ℹ See https://www.census.gov/library/fact-sheets/2021/protecting-the-confidentiality-of-the-2020-census-redistricting-data.html for additional guidance.
## This message is displayed once per session.
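The population-weighted aggregation can be sketched in base R as population-weighted areal interpolation. This is a minimal illustration under stated assumptions: each block nests in exactly one tract, each block is assigned wholly to one catchment (for example by its centroid), and a tract's ACS count is spread over its blocks in proportion to block population. All object names and numbers are hypothetical.

```r
# Hypothetical blocks: each nests in one tract and is assigned to one catchment;
# pop stands in for 2020 block population
blocks <- data.frame(
  block_id  = c("b1", "b2", "b3", "b4"),
  tract     = c("t1", "t1", "t2", "t2"),
  catchment = c("Library A", "Library B", "Library B", "Library B"),
  pop       = c(300, 100, 50, 150)
)

# Hypothetical tract-level ACS count to redistribute (e.g. households in hhi1)
tracts <- data.frame(tract = c("t1", "t2"), hhi1 = c(80, 100))

# Weight = block population / total population of the block's tract
blocks$weight <- blocks$pop / ave(blocks$pop, blocks$tract, FUN = sum)

# Allocate each tract's count to its blocks in proportion to those weights
blocks$hhi1 <- tracts$hhi1[match(blocks$tract, tracts$tract)] * blocks$weight

# Sum the block allocations within each catchment area
catchment_hhi1 <- aggregate(hhi1 ~ catchment, data = blocks, FUN = sum)
print(catchment_hhi1)
```

Because the weights within each tract sum to 1, the catchment totals add back up to the tract totals (180 in this toy case), which is a useful sanity check on the real block and tract layers.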
Chicago Shapefile
American Community Survey (ACS) Demographic Data
Public Library Locations
Population Data and Census Blocks for Cook County, IL
Edge/shape effects when creating polygons to represent library service/catchment areas
Visualizing catchment areas for libraries is my first objective because, unlike primary schools, which have definite attendance boundaries, libraries do not have formal "service areas." Thiessen (Voronoi) polygons have previously been used to map catchment or service areas by proximity to points. As explained by Flitter et al. (n.d.), GIS tools that generate Thiessen polygons draw shapes around a layer of point data such that every location within a shape is nearer to that shape's point than to any other point in the layer. These proximal regions assume that people are most likely to visit the library closest to them and, as a result, that library services should reflect their local constituents. I recognize that this method has flaws because the assumption does not always hold: some people frequent libraries outside of their residential neighborhood for a variety of reasons, and there is no way of accurately tracking that. Alternatives would be to draw buffers around library points, as in the Kang et al. (year) study, or to perform a network analysis. Thiessen polygons are, however, simpler and computationally less intensive than a full network analysis. Although they might seem arbitrary, I have attempted to improve their validity by including a population-weighted aggregation to more accurately estimate the neighborhood characteristics of the library service areas.
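The proximity rule behind Thiessen polygons can be illustrated without any GIS library. This base R sketch assigns each location to its nearest library by straight-line distance; a location lies in a library's Thiessen polygon exactly when that library is its nearest point, so the labels match the polygons. The coordinates are hypothetical projected values in metres, and, like the polygons themselves, the sketch ignores the street network that a buffer or network analysis would treat differently.

```r
# Hypothetical library locations in projected coordinates (metres)
libraries <- data.frame(
  name = c("Library A", "Library B", "Library C"),
  x = c(0, 4000, 2000),
  y = c(0, 0, 3000)
)

# Hypothetical residential locations, e.g. census block centroids
homes <- data.frame(
  x = c(500, 3600, 2100, 1500),
  y = c(200, 300, 2500, 900)
)

# Name of the library nearest to (px, py) by Euclidean distance;
# this is the same rule that defines each library's Thiessen polygon
nearest_library <- function(px, py, libs) {
  d <- sqrt((libs$x - px)^2 + (libs$y - py)^2)
  libs$name[which.min(d)]
}

homes$catchment <- mapply(nearest_library, homes$x, homes$y,
                          MoreArgs = list(libs = libraries))
print(homes)
```

A location equidistant from two libraries (on a shared polygon edge) goes to whichever library is listed first, because `which.min()` returns the first minimum.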
The ACS reports household income in 16 brackets of its own. To suit the purposes of this study, I reclassified them into seven simpler bins for the Household Income ACS data.
| Variable Name in Study | Study Label | Variable Used from ACS Data | ACS Label |
|---|---|---|---|
| hhi1 | under 25k | B19001_002E | Less than $10,000 |
| | | B19001_003E | $10,000 to $14,999 |
| | | B19001_004E | $15,000 to $19,999 |
| | | B19001_005E | $20,000 to $24,999 |
| hhi2 | 25k - 49.9k | B19001_006E | $25,000 to $29,999 |
| | | B19001_007E | $30,000 to $34,999 |
| | | B19001_008E | $35,000 to $39,999 |
| | | B19001_009E | $40,000 to $44,999 |
| | | B19001_010E | $45,000 to $49,999 |
| hhi3 | 50k - 74.9k | B19001_011E | $50,000 to $59,999 |
| | | B19001_012E | $60,000 to $74,999 |
| hhi4 | 75k - 99.9k | B19001_013E | $75,000 to $99,999 |
| hhi5 | 100k - 149.9k | B19001_014E | $100,000 to $124,999 |
| | | B19001_015E | $125,000 to $149,999 |
| hhi6 | 150k - 199.9k | B19001_016E | $150,000 to $199,999 |
| hhi7 | over 200k | B19001_017E | $200,000 or more |
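The reclassification in the table can be expressed as a lookup from each study bin to its ACS columns, followed by a row-wise sum. A minimal base R sketch, assuming the ACS data are in wide format with one estimate column per bracket (the `E` suffix in the table is the one tidycensus appends to estimate columns with `output = "wide"`); the two tracts and their counts are hypothetical.

```r
# Study bins mapped to the ACS B19001 estimate columns they combine (per the table)
income_bins <- list(
  hhi1 = sprintf("B19001_%03dE", 2:5),    # under 25k
  hhi2 = sprintf("B19001_%03dE", 6:10),   # 25k - 49.9k
  hhi3 = sprintf("B19001_%03dE", 11:12),  # 50k - 74.9k
  hhi4 = "B19001_013E",                   # 75k - 99.9k
  hhi5 = sprintf("B19001_%03dE", 14:15),  # 100k - 149.9k
  hhi6 = "B19001_016E",                   # 150k - 199.9k
  hhi7 = "B19001_017E"                    # over 200k
)

# Hypothetical wide-format ACS rows for two tracts (household counts per bracket)
counts <- matrix(c(1:16, rep(10L, 16)), nrow = 2, byrow = TRUE,
                 dimnames = list(NULL, sprintf("B19001_%03dE", 2:17)))
acs <- data.frame(GEOID = c("tract_1", "tract_2"), counts)

# Each study bin is the row-wise sum of its ACS bracket columns
for (bin in names(income_bins)) {
  acs[[bin]] <- rowSums(acs[, income_bins[[bin]], drop = FALSE])
}
print(acs[, c("GEOID", names(income_bins))])
```

Keeping the mapping in one list makes it easy to check that every ACS bracket is used exactly once, so no households are dropped or double counted.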
Code for other tables that I will be working with later.
I took one of the ACS tables and selected the necessary geographic identifiers (STATEFP, COUNTYFP, TRACTCE, GEOID, NAME.X, ALAND, AWATER, geometry) along with the source fields I created in the previous step. For all of the following tables, I selected only the source fields I created, because I will be doing a spatial join and selecting the geographic fields again would be redundant.
I clipped the final table to the Chicago geometry so that it includes only tracts within Chicago's city boundaries.
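Restricting tracts to the city boundary can be sketched with sf. This toy example uses hypothetical square geometries in EPSG:32616 (the `square()` helper is hypothetical, for illustration only) and shows two readings of "only tracts within Chicago": `st_intersection()` cuts tracts at the boundary, whereas an `st_within()` test keeps only whole tracts lying completely inside. Which one fits depends on how tracts crossing the city line should be treated.

```r
library(sf)

# Hypothetical helper: square polygon from its lower-left corner and side length
square <- function(xmin, ymin, size) {
  st_polygon(list(rbind(
    c(xmin, ymin), c(xmin + size, ymin), c(xmin + size, ymin + size),
    c(xmin, ymin + size), c(xmin, ymin)
  )))
}

# Hypothetical city boundary and three tracts: inside (A), straddling (B), outside (C)
chicago <- st_sfc(square(0, 0, 10), crs = 32616)
tracts <- st_sf(
  GEOID = c("A", "B", "C"),
  geometry = st_sfc(square(2, 2, 2), square(8, 8, 4), square(20, 20, 2), crs = 32616)
)

# Declare attributes constant over each geometry so st_intersection() does not warn
st_agr(tracts) <- "constant"

# Clip: keep only the parts of tracts inside the boundary (B is cut, C is dropped)
tracts_clipped <- st_intersection(tracts, chicago)

# Alternative: keep only whole tracts lying entirely within the boundary (A only)
tracts_within <- tracts[lengths(st_within(tracts, chicago)) > 0, ]
print(tracts_clipped)
```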
To delineate the catchment areas, I will create Thiessen (Voronoi) polygons from the library points.
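A minimal sf sketch of this step, assuming hypothetical library points already projected to EPSG:32616 and a rectangular stand-in for the Chicago boundary. It uses sf's `st_voronoi()` (the deldir and spatstat packages loaded above offer other routes to the same tessellation), unions the points first because `st_voronoi()` works on a single multipoint geometry, and clips the cells so the outer polygons stop at the study area, which is one way to handle the edge effects noted earlier.

```r
library(sf)

# Hypothetical library points in projected metres (EPSG:32616)
libs <- st_sf(
  name = c("Library A", "Library B", "Library C"),
  geometry = st_sfc(st_point(c(0, 0)), st_point(c(4000, 0)), st_point(c(2000, 3000)),
                    crs = 32616)
)

# Hypothetical study boundary used to clip the otherwise unbounded outer cells
boundary <- st_as_sfc(st_bbox(c(xmin = -1000, ymin = -1000, xmax = 5000, ymax = 4000),
                              crs = st_crs(32616)))

# Voronoi diagram of the unioned points; st_cast() splits the returned
# GEOMETRYCOLLECTION into one polygon per library
cells <- st_cast(st_voronoi(st_union(libs), envelope = boundary))

# Cell order is not guaranteed to match input order, so attach names by containment
catchments <- st_join(st_sf(geometry = cells), libs, join = st_contains)

# Clip cells to the study boundary
st_agr(catchments) <- "constant"
catchments <- st_intersection(catchments, boundary)
print(catchments)
```

Joining by `st_contains` rather than by row position keeps each polygon paired with the library that generated it.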